Abstract

In this paper, we consider binary response models with linear quantile restrictions. Considerably
generalizing previous research on this topic, our analysis focuses on an infinite collection
of quantile estimators. We derive a uniform linearization for the properly standardized empirical
quantile process and discover some surprising differences with the setting of continuously
observed responses. Moreover, we show that considering quantile processes provides an effective
way of estimating binary choice probabilities without restrictive assumptions on the form of
the link function, heteroskedasticity or the need for high dimensional non-parametric smoothing
necessary for approaches available so far. A uniform linear representation and results on
asymptotic normality are provided, and the connection to rearrangements is discussed.

1 Introduction 

In various situations in daily life, individuals are faced with making a decision that can be described
by a binary variable. Examples relevant to various fields of economics include the decision to
participate in the labour market, to retire, or to make a major purchase. From an econometric point
of view, such decisions can be modelled by a binary response variable $Y = I\{Y^* > 0\}$ that depends
on an unobserved continuous random variable $Y^*$ which summarizes an individual's preferences. In
the presence of covariates, say $W$, a natural question is: what can we infer about the distribution
of the unobserved variable $Y^*$ conditional on $W$ from observations of i.i.d. replicates of $(Y,W)$?



In a seminal paper, Manski (1975) assumed that $Y^* = W^T\beta + \varepsilon$ where the 'error' $\varepsilon$ satisfies the
conditional median restriction $P(\varepsilon \le 0 | W = w) = 0.5$ and derived conditions on the distribution
of $(\varepsilon, W)$ that imply identifiability of the coefficient vector $\beta$ up to scale. In later work,
Manski (1985) extended those results to general quantile restrictions of the form $P(\varepsilon \le 0 | W = w) = \tau$ for
fixed $\tau \in (0,1)$. A more detailed discussion of identification issues was provided in Manski (1988).
Due to their importance in understanding binary decisions, binary choice models have ever since
aroused a lot of interest and many estimation procedures have been proposed [see Cosslett (1983),
Horowitz (1992), Powell et al. (1989), Ichimura (1993), Klein and Spady (1993), Coppejans (2001),
Kordas (2006) and Khan (2013) to name just a few].

*The idea of considering binary response quantile processes originated from discussions with Prof. Roger Koenker.
I am thankful to him for the encouragement and many insightful discussions on this topic. The mistakes are of course
my sole responsibility. This research was conducted while I was a visiting scholar at UIUC. I am very grateful to the
Statistics and Economics departments for their hospitality. Financial support from the DFG (grant V01799/1-1) is
gratefully acknowledged.

A particularly challenging part of analysing binary response models lies in understanding the
stochastic properties of corresponding estimation procedures. The asymptotic distribution of Manski's
estimator was derived in Kim and Pollard (1990) under fairly general conditions, while a
non-standard case was considered in Portnoy (1998). In particular, Kim and Pollard (1990) demonstrated
that the convergence rate is $n^{-1/3}$ and that the limiting distribution is non-Gaussian. A
different approach based on non-parametric smoothing that avoids some of the difficulties encountered
by Manski's estimator was taken by Horowitz (1992). By smoothing the objective function,
Horowitz (1992) obtained both better rates of convergence and a normal limiting distribution.
However, note that the smoothness conditions on the underlying model are stronger than those of
Kim and Pollard (1990).



The approaches of Manski and Horowitz have in common that only estimators for the coefficient
vector $\beta$ are provided. While those coefficients are of interest and can provide valuable structural
information, their interpretation can be quite difficult since the scale of $\beta$ is not identifiable from
the observations. On the other hand, the 'binary choice probabilities' $p_w := P(Y = 1 | W = w)$
provide a much simpler and more straightforward interpretation.

Most of the available methods for estimating binary choice probabilities are of two basic types. The
first and more thoroughly studied approach is to assume a model of the form $Y_i^* = W_i^T\beta + \varepsilon_i$ where
the $\varepsilon_i$ are assumed to be either independent of $W_i$ [see Cosslett (1983) and Coppejans (2001)],
or admit a very special kind of heteroskedasticity [Klein and Spady (1993)]. Another popular
[...] Powell et al. (1989) or Ichimura (1993) [...] between $\varepsilon$ and the covariate $W$.

While in the settings described above it is possible to obtain parametric rates of convergence for
the coefficient vector $\beta$ and also construct estimators for choice probabilities, in many cases the
assumptions on the underlying model structure seem too restrictive.

An alternative approach allowing for general forms of heteroskedasticity was recently investigated
by Khan (2013), who proved that under general smoothness conditions any binary response model
with $Y_i^* = W_i^T\beta + \varepsilon_i$ is observationally equivalent to a Probit/Logit model with multiplicative
heteroskedasticity, that is a model where $\varepsilon_i = \sigma_0(W_i)U_i$ with $U_i$ independent of $W_i$ and general
scale function $\sigma_0$. Khan (2013) also proposed to simultaneously estimate $\beta$ and the function $\sigma_0$
by a semi-parametric sieve approach. The resulting model allows one to obtain an estimator of
the binary choice probabilities. While this idea is extremely interesting, it effectively requires
estimation of a $d$-dimensional function in a non-parametric fashion. For the purpose of estimating
$\beta$, the function $\sigma_0$ can be viewed as a nuisance parameter and its estimation does not have an impact
on the rate at which $\beta$ is estimable. However, the binary choice probabilities explicitly depend on
$\sigma_0$ and can thus only be estimated at the corresponding $d$-dimensional non-parametric rate. In
settings where $d$ is moderately large this can be quite problematic.

In the classical setting where responses are observed completely, linear quantile regression models
[see Koenker and Bassett (1978)] have proved useful in providing a model that can incorporate general
forms of heteroskedasticity and at the same time avoid non-parametric smoothing. In particular,
by looking at a collection of quantile coefficients indexed by the quantile level $\tau$ it is possible to
obtain a broad picture of the conditional distribution of the response given the covariates. The aim
of the present paper is to carry this approach into the setting of binary response models. In contrast
to existing methods, we can on one hand allow for rather general forms of heteroskedasticity
and at the same time estimate binary choice probabilities without the need of non-parametrically
estimating a $d$-dimensional function.

The ideas explored here are closely related to the work of Kordas (2006). Yet, there are many
important differences. First, in his theoretical investigations, Kordas (2006) considered only a finite
collection of quantile levels. The present paper aims at considering the quantile process. Contrary
to the classical setting, and also contrary to the results suggested by the analysis in Kordas (2006),
we see that the asymptotic distribution is a white noise type process with limiting distributions
corresponding to different quantile levels being independent. An intuitive explanation of this seemingly
surprising fact along with rigorous theoretical results can be found in Section 2. We thus
provide both a correction and considerable extension of the findings in Kordas (2006).
Further, our results on the quantile process pave the way to obtaining an estimator for the conditional
probabilities $p_w$ and derive its asymptotic representation. While a related idea was considered
in Kordas (2006), no theoretical justification of its validity was provided. Moreover, we are able
to considerably relax the identifiability assumptions that were implicitly made there. Finally,
we demonstrate that our ideas are closely related to the concept of rearrangement [see Dette et al.
(2006) or Chernozhukov et al. (2010)] and provide new theoretical insights regarding certain properties
of the rearrangement map that seem to be of independent interest.

The rest of the paper is organized as follows. In Section 2 we formally state the model and provide
results on uniform consistency and a uniform linearization of the binary response quantile process.
All results hold uniformly over an infinite collection of quantiles $T$. In Section 3 we show how the
results from Section 2 can be used to obtain estimators of choice probabilities. We elaborate on
the connection of this approach to rearrangements. Finally, a uniform asymptotic representation
for a properly rescaled version of the proposed estimators is provided and their joint asymptotic
distribution is briefly discussed. All proofs are deferred to an appendix.



2 Estimating the coefficients 



Before we proceed to state our results, let us briefly recall some basic facts about identification
in binary response models and provide some intuition for the estimators of Manski (1975) and
Horowitz (1992). Assume that we have $n$ i.i.d. replicates, say $(Y_i,W_i)_{i=1,\dots,n}$, drawn from the
distribution of $(Y = I\{Y^* > 0\}, W)$ with $Y^*$ denoting the unobserved variable of interest and $W$
denoting a vector of covariates. Further, denote by $q_\tau(w)$ the conditional quantile function of $Y^*$
given $W = w$ and assume that for $\tau \in T \subset [0,1]$ we have $q_\tau(w) = w^T\beta_\tau$ for some vectors $\beta_\tau$.
Observing that $I\{Y^* > 0\} = I\{aY^* > 0\}$ with $a > 0$ arbitrary directly shows that the scale of
the vector $\beta_\tau$ can not be identified from $(Y,W)$. On the other hand, the vector $\beta_\tau$ is identified up
to scale if for example $b \neq \beta_\tau$ implies that the distribution of $Y$ conditional on $w^T\beta_\tau > 0$ differs
from that conditional on $w^Tb > 0$ on a sufficiently large set. More precisely, assume that the
function $u \mapsto F_{Y^*|W}(w^T\beta_\tau + u | w)$ is strictly increasing for $u$ in a neighbourhood of zero and all
$w \in \mathrm{support}(W)$. In that case we have by the definition of the $\tau$'th quantile

\[
P(Y = 1 | W = w)
\begin{cases}
> 1 - \tau, & \text{if } w^T\beta_\tau > 0, \\
= 1 - \tau, & \text{if } w^T\beta_\tau = 0, \\
< 1 - \tau, & \text{if } w^T\beta_\tau < 0.
\end{cases}
\]



This already suggests that the expectation of $(Y - (1 - \tau))$ is positive for $W^T\beta_\tau > 0$ and negative
for $W^T\beta_\tau < 0$. We thus expect that under appropriate conditions the function

\[ S_\tau(\beta) := E\big[(Y - (1 - \tau)) I\{W^T\beta > 0\}\big] \]

should be maximal at $a\beta_\tau$ for any $a > 0$. Consider a vector $b \in \mathbb{R}^{d+1}$. Then

\[ S_\tau(b) - S_\tau(\beta_\tau) = D_1(b,\tau) + D_2(b,\tau) \]

with

\[ D_1(b,\tau) := E\big[(Y - (1 - \tau)) I\{W^Tb > 0 \ge W^T\beta_\tau\}\big], \]
\[ D_2(b,\tau) := -E\big[(Y - (1 - \tau)) I\{W^Tb \le 0 < W^T\beta_\tau\}\big]. \]

Note that both quantities are non-positive, and at least one of them being strictly negative is
sufficient for inferring $\beta_\tau/\|\beta_\tau\|$ from the observable data. An overview and more detailed
discussion of related results is provided in Chapter 4 of Horowitz (2009).
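The maximizing property of the score function can be checked numerically. The following is a minimal sketch under assumed, purely illustrative choices (a two-dimensional covariate, a heteroskedastic error with conditional median zero, and the hypothetical coefficient `beta_true`); none of these come from the paper. It evaluates the sample analogue of $S_\tau$ at the true coefficient, at a positive multiple of it, and at a different direction.

```python
import numpy as np

def empirical_score(y, w, beta, tau):
    """Sample analogue of S_tau(beta) = E[(Y - (1 - tau)) I{W^T beta > 0}]."""
    return np.mean((y - (1.0 - tau)) * (w @ beta > 0))

rng = np.random.default_rng(0)
n, tau = 50_000, 0.5

# Hypothetical design: W = (Z, X) with Z normal and X uniform on [-1, 1].
z = rng.normal(size=n)
x = rng.uniform(-1.0, 1.0, size=n)
w = np.column_stack([z, x])

# Heteroskedastic error whose conditional median given W is zero (assumption).
eps = (1.0 + 0.5 * np.abs(x)) * rng.standard_normal(n)
beta_true = np.array([1.0, -0.5])          # illustrative median coefficient
y = (w @ beta_true + eps > 0).astype(float)

s_true = empirical_score(y, w, beta_true, tau)
s_scaled = empirical_score(y, w, 3.0 * beta_true, tau)   # same direction, other scale
s_other = empirical_score(y, w, np.array([1.0, 1.5]), tau)
```

In this design the score is unchanged by positive rescaling, which mirrors the lack of scale identification, while a different direction gives a visibly smaller value.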

A common assumption [see e.g. Chapter 4 in Horowitz (2009)] is that one component of $\beta_\tau$ is either
constant or at least bounded away from zero. Without loss of generality, we assume that this holds
for the first component of $\beta_\tau$. In order to simplify the notation of what follows, write the covariate
$W$ in the form $W = (Z, X)$ with $Z$ being the first component of $W$ and $X$ denoting the remaining
components. Denote the supports of $X, Z, W$ by $\mathcal{X}, \mathcal{Z}, \mathcal{W}$, respectively. Denote by $(Y_i,W_i)_{i=1}^n$ a
sample of i.i.d. realizations of the random variable $(Y,W)$. Define the empirical counterpart of $S_\tau$
by

\[ \tilde S_{n,\tau}(\beta) := \frac{1}{n}\sum_{i=1}^n (Y_i - (1-\tau)) I\{W_i^T\beta > 0\} \]

and consider a smoothed version

\[ \hat S_{n,\tau}(\beta) := \frac{1}{n}\sum_{i=1}^n (Y_i - (1-\tau)) \mathcal{K}\Big(\frac{W_i^T\beta}{h_n}\Big) \]

with $h_n$ denoting a bandwidth parameter and $\mathcal{K}(u) := \int_{-\infty}^u K(v)dv$ a smoothed version of the
indicator function $I\{u \ge 0\}$. Following Horowitz (2009), define the estimator $(\hat s_\tau, \hat b_\tau)$ through

\[ (\hat s_\tau, \hat b_\tau) := \mathrm{argmax}_{s=\pm 1,\, b\in\mathbb{R}^d}\, \hat S_{n,\tau}\big((s, b^T)^T\big). \]
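The smoothed maximum score estimator defined above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: the integrated kernel is taken to be the logistic distribution function (which is bounded and approaches the indicator in the tails, but is not a higher-order kernel), the covariate $X$ is scalar, and the maximization is a plain grid search. The function names, the simulated design and the bandwidth are all hypothetical.

```python
import numpy as np

def smooth_indicator(u):
    """Integrated kernel: logistic CDF, a smooth stand-in for I{u > 0}."""
    return 0.5 * (1.0 + np.tanh(0.5 * u))   # stable form of 1 / (1 + exp(-u))

def smoothed_score(y, z, x, s, b, tau, h):
    """(1/n) sum_i (Y_i - (1 - tau)) * smooth_indicator((s Z_i + X_i b) / h)."""
    return np.mean((y - (1.0 - tau)) * smooth_indicator((s * z + x * b) / h))

def smoothed_max_score(y, z, x, tau, h, b_grid):
    """Grid search over s in {-1, +1} and b in b_grid; returns (s_hat, b_hat)."""
    candidates = ((smoothed_score(y, z, x, s, b, tau, h), s, b)
                  for s in (-1.0, 1.0) for b in b_grid)
    _, s_hat, b_hat = max(candidates)
    return s_hat, b_hat

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)
x = rng.uniform(-1.0, 1.0, size=n)
# Assumed model: Y* = Z - 0.5 X + error with conditional median zero.
eps = (1.0 + 0.5 * np.abs(x)) * rng.standard_normal(n)
y = (z - 0.5 * x + eps > 0).astype(float)

s_hat, b_hat = smoothed_max_score(y, z, x, tau=0.5, h=0.1,
                                  b_grid=np.linspace(-2.0, 2.0, 81))
```

A grid search is used only because it is transparent for scalar $b$; for larger $d$ a numerical optimizer with several starting values would be the more natural choice.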

Remark 2.1 The proofs of all subsequent results implicitly rely on the fact that we know which
coefficient stays away from zero and that the covariate corresponding to this particular coefficient
has a 'nice' distribution conditional on all other coefficients [see assumptions (F1), (D2) etc.]. This
is in line with the approach of Horowitz (1992) and Kordas (2006) and makes sense in many practical
examples. Results similar to the ones presented below might continue to hold if we use Manski's
normalization $\|\beta\| = 1$ instead of setting the 'right' component to $\pm 1$. However, the asymptotic
representation would be somewhat more complicated. For this reason, we leave this interesting
question to future research.

In all of the subsequent developments we make the following basic assumption. 

(A) The coefficient $\beta_{\tau,1}$ satisfies $\inf_{\tau\in T}|\beta_{\tau,1}| > 0$ and the coefficient $\beta_{\tau,1}$ has the same sign on all
of $T$. In what follows, denote this sign by $s_0$.

Remark 2.2 Note that due to the scaling the estimator $\hat b_\tau$ defined above is an estimator of the
re-scaled quantity $b_\tau := \tilde b_\tau/|\beta_{\tau,1}|$ where $\tilde b_\tau := (\beta_{\tau,2},\dots,\beta_{\tau,d+1})^T$. When interpreting the estimator
$\hat b_\tau$, this must be taken into account. In particular, $\hat b_\tau$ can not be interpreted as a classical quantile
regression coefficient. This also explains the reason behind assumption (A).

In order to establish uniform consistency of the smoothed maximum score estimator, we need the 
following assumptions. 

(K1) The function $\mathcal{K}$ is uniformly bounded and satisfies $\sup_{|v|>c}|\mathcal{K}(v) - I\{v > 0\}| \to 0$ as $c \to \infty$.

(F1) The conditional distribution function of $Z$ given $X$, say $F_{Z|X}$, is uniformly continuous uniformly
over $x \in \mathcal{X}$, that is

\[ \sup_{x\in\mathcal{X}}\sup_{v\in\mathbb{R}}\sup_{|u|\le\delta} |F_{Z|X}(v|x) - F_{Z|X}(v+u|x)| \to 0 \quad \text{as } \delta \to 0. \]



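(D1) For any fixed $\tau \in T$, $\bar\beta_\tau = (s_0, b_\tau^T)^T$ is the unique maximizer of $S_\tau(\beta)$ on $\{-1,1\} \times \mathbb{R}^d$ and
additionally

\[ d(a) := \sup\Big\{\varepsilon \,\Big|\, \inf_{\tau\in T}\ \inf_{\|\beta-\bar\beta_\tau\|\ge\varepsilon,\ |\beta_1|=1} |S_\tau(\beta) - S_\tau(\bar\beta_\tau)| \le a\Big\} \to 0 \quad \text{as } a \to 0. \]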

In order to intuitively understand the meaning of condition (D1) above, note that conditions (K1)
and (F1) imply that $\hat S_{n,\tau}(\beta) \to S_\tau(\beta)$ uniformly in $\tau, \beta$. Condition (D1) essentially requires that the
maximum of $S_\tau(\beta)$ is 'well separated' uniformly in $\tau$, which allows one to obtain uniform consistency of
a sequence of maximizers of any function that uniformly converges to $S_\tau$. Versions of this condition
that are directly connected to densities and distributions of some of the regressors can for example
be derived by considering a uniform version of Assumption 2 in Manski (1985) by arguments similar
to the ones given in that paper, see also Assumptions 1-3 in Horowitz (1992).

Lemma 2.3 Under assumptions (K1), (D1), (F1) let $h_n \to 0$. Then the estimator $(\hat s_\tau, \hat b_\tau)$ is weakly
uniformly consistent, that is

\[ \sup_{\tau\in T}\|(\hat s_\tau, \hat b_\tau) - (s_0, b_\tau)\|_\infty = o_P(1). \]

The next collection of assumptions is sufficient for deriving a uniform linearization [some kind of
'Bahadur representation'] for $\hat b_\tau$. Assume that there exists some $\eta > 0$ such that the following
conditions hold.



(K2) The function $\mathcal{K}$ is two times continuously differentiable and its second derivative is uniformly
Hölder continuous of order $\gamma > 0$, that is it satisfies

\[ |\mathcal{K}''(x) - \mathcal{K}''(y)| \le C_K|x-y|^\gamma \quad \text{for all } x, y \text{ with } |x-y| \le \delta. \]

Denote the derivative of $\mathcal{K}$ by $K$. Assume that additionally, $K, K'$ are uniformly bounded and
we have $\int |v^2K'(v)|dv < \infty$, $\int |K'(v)|^2dv < \infty$ and additionally $a_n := h_n^{-1}\int_{|vh_n|>\eta}|K'(v)|dv = o(1)$.

(K3) $\int |v^kK(v)|dv < \infty$ for some $k \ge 2$ and $\int v^jK(v)dv = 0$ for $1 \le j < k$ and additionally
$\int_{|vh_n|>\eta}|K(v)|dv = o(h_n^k)$ as well as $\sup_{|a|\ge x}|K(a)| = o(x^{-1})$ as $x \to \infty$.

(B) The bandwidth $h_n$ satisfies $h_n = o(1)$ and additionally $(nh_n^3)^{-1/2}(\log n)^2 = o(1)$.

(D2) The distribution of $X$ has bounded support $\mathcal{X}$. For almost every $x \in \mathcal{X}$, the covariate $Z$ has
a conditional density $f_{Z|X}(\cdot|x)$.

(D3) For any vector $b$ with $\|b - b_\tau\| \le \eta$ the two functions $u \mapsto f_{Z|X}(s_0(-x^Tb_\tau + u)|x)$ and
$u \mapsto F_{Y^*|X,Z}(0|x, s_0(-x^Tb_\tau + u))$ are two times continuously differentiable at every $u \in U_\eta(0)$
for almost every $x \in \mathcal{X}$ and the first and second derivatives are uniformly bounded [uniformly
over $x \in \mathcal{X}, \tau \in T$].

(D4) The function $u \mapsto f_{Z|X}(s_0(-x^Tb_\tau + u)|x)$ is $k-1$ times continuously differentiable for every $x \in
\mathcal{X}$ at every $u$ with $|u| \le \eta$. All derivatives are uniformly bounded and uniformly continuous
uniformly in $x, \tau$. The function $u \mapsto F_{Y^*|X,Z}(0|x, s_0(-x^Tb_\tau + u))$ is $k$ times continuously
differentiable at every $u$ with $|u| \le \eta$ at almost every $x \in \mathcal{X}$ and all derivatives are uniformly
bounded and uniformly continuous uniformly in $x, \tau$.

(D5) The map $\tau \mapsto \beta_\tau$ is uniformly on $T$ Hölder continuous of order $\gamma > 0$, that is
$\sup_{\tau,\tau'\in T,\ |\tau-\tau'|\le\delta}\|\beta_\tau - \beta_{\tau'}\| \le C\delta^\gamma$ for some universal constant $C$, some $\gamma > 0$ and all $\delta \le \delta_0$ with $\delta_0 > 0$.

(D6) For any $\tau_1 \neq \tau_2$ with $\tau_1, \tau_2 \in T$ there exists $\varepsilon > 0$ such that $P(|W^T(\beta_{\tau_1} - \beta_{\tau_2})| < \varepsilon) = 0$.

(Q) We have $\inf_{\tau\in T}|\lambda_{\max}(Q(s_0, b_\tau, \tau))| > 0$ where $\lambda_{\max}(A)$ denotes the largest eigenvalue of the
matrix $A$ and we defined

\[ Q(s,b,\tau) := \int \frac{\partial}{\partial u}\Big[\big(\tau - F_{Y^*|X,Z}(0|x, s(-x^Tb + u))\big) f_{Z|X}(s(-x^Tb + u)|x)\Big]\Big|_{u=0}\, xx^T\, dP_X(x). \]
The conditions on the kernel function $K$ are standard in the binary response setting and were for
example considered in Horowitz (2009) and Kordas (2006). Assumptions (D2)-(D4) and (Q) are
uniform versions of the conditions in Horowitz (1992) and are needed to obtain results holding
uniformly in an infinite collection of quantiles. Condition (D5) is needed to obtain a rate in the
uniform representation below. Condition (D6) implies asymptotic independence of the limiting
variables corresponding to different quantile levels. Essentially, it states that quantile curves corresponding
to different quantile levels should be 'uniformly separated' which is reasonable in most
applications. In particular, (D6) follows if the conditional density $f_{Y^*|W}(y|w)$ of $Y^*$ given $W$ is
uniformly bounded away from zero for all $(y,w)$ with $F_{Y^*|W}(y|w) \in T$ and $w$ in the support of $W$.

Remark 2.4 Some straightforward calculations show that under assumption (D3) and the boundedness
of the support of $Z$ the matrix $Q(s_0, b_\tau, \tau)$ in condition (Q) is the second derivative of the
function $b \mapsto S_\tau((s_0, b^T)^T)$ evaluated in $b_\tau$. Since $S_\tau$ is assumed to be maximal in this point, the
matrix $Q(s_0, b_\tau, \tau)$ is negative definite, and thus we need to bound its largest eigenvalue away from
zero in order to obtain a uniform version of the non-singularity of $Q$.

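Theorem 2.5 Under assumptions (A), (B), (D1)-(D5), (F1), (K1)-(K3), we have

\[ Q(s_0, b_\tau, \tau)(\hat b_\tau - b_\tau) = -\hat T_n(s_0, b_\tau, \tau) + R_n(\tau) \tag{2.1} \]

where

\[ \hat T_n(s,b,\tau) := \frac{\partial \hat S_n(s,b,\tau)}{\partial b} = \frac{1}{nh_n}\sum_{i=1}^n (Y_i - (1-\tau))\, K\Big(\frac{X_i^Tb + sZ_i}{h_n}\Big) X_i, \]

\[ \sup_{\tau\in T}\|R_n(\tau)\| = O_P(\kappa_n) := O_P\Big[\big(h_n^k + (nh_n)^{-1/2}\log n\big)\big((nh_n^3)^{-1/2}\log n + h_n + a_n\big)\Big]. \]

In particular, $\kappa_n = o(h_n^k + (nh_n)^{-1/2})$ and thus negligible compared to $\hat T_n(s_0, b_\tau, \tau)$.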

Now assume that additionally condition (D6) holds. Then, for any finite collection $\tau_1,\dots,\tau_k \in T$
we obtain



nh n ( b Tj - b Tj 



T n (s Q ,b T .,Tj) 



v 



j=i,...,k 



(QrAs Q ,b T Tj^Mr 



(2.2) 



where 



T n (s,b,r) 



If 

">n 
kl 



v h K(v)dv / g k (s,b,x)xdP x (x), 



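$g_j(s,x,b) := \frac{\partial^j}{\partial u^j}\Big[\big(\tau - F_{Y^*|X,Z}(0|x, s(-x^Tb + u))\big) f_{Z|X}(s(-x^Tb + u)|x)\Big]\Big|_{u=0}$,
$M_{\tau_i}, M_{\tau_j}$ are independent for $j \neq i$ and

\[ M_\tau \sim \mathcal{N}(0, \Sigma_\tau) \]

where

\[ \Sigma_\tau := \tau(1-\tau)\int K^2(u)du \int xx^T f_{Z|X}(-s_0x^Tb_\tau|x)\,dP_X(x). \]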
Compared to the results available in the literature [e.g. in Kordas (2006) and Horowitz (2009)],
the preceding theorem provides two important new insights. To the best of our knowledge, it is
the first time that the estimator is simultaneously considered at an infinite collection of quantiles.
Equally importantly, it demonstrates that the joint asymptotic distribution of several quantiles
differs substantially from what both intuition and results in Kordas (2006) seem to suggest.
Remark 2.6 In contrast to the 'classical' case, the properly normalized quantile process at different
quantile levels converges to independent random variables. An intuitive explanation for this
surprising fact can be obtained from the asymptotic linearization in (2.1). For simplicity, assume
that the kernel $K$ has compact support, say $[-1,1]$. Then all observations that have a non-zero
contribution to $\hat T_n(s_0, b_\tau, \tau)$ will need to satisfy $|W_i^T\beta_\tau| \le h_n|\beta_{\tau,1}|$. In particular, letting $h_n \to 0$
implies that asymptotically for different values of $\tau$ disjoint sets of observations will be driving the
distribution of $\hat T_n$. Similar phenomena can be observed in other settings that include non-parametric
smoothing, a classical example being density estimation. Note that regarding this particular point
the paper of Kordas (2006) contained a mistake. More precisely, Kordas (2006) claimed that the
asymptotic distributions corresponding to different quantiles have a non-trivial covariance which is
not the case.
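The 'disjoint windows' intuition can be made concrete with a small simulation. The sketch below assumes a hypothetical location-shift model with standard normal errors, so that the index $w^T\beta_\tau$ equals $z + 0.5x + c_\tau$ with $c_\tau$ the $\tau$-quantile of the error and first coefficient equal to one; the design and the helper `window_overlap` are illustrative and not taken from the paper. It counts how many observations fall into the kernel windows of two quantile levels at the same time.

```python
import numpy as np

def window_overlap(index_1, index_2, h):
    """Number of observations with |index_1| <= h and |index_2| <= h simultaneously."""
    return int(np.sum((np.abs(index_1) <= h) & (np.abs(index_2) <= h)))

rng = np.random.default_rng(1)
n = 10_000
z = rng.normal(size=n)
x = rng.uniform(-1.0, 1.0, size=n)

# Assumed location-shift model: w^T beta_tau = z + 0.5 x + c_tau.
c_25, c_75 = -0.6745, 0.6745          # standard normal quartiles
idx_25 = z + 0.5 * x + c_25
idx_75 = z + 0.5 * x + c_75

# The two indices differ by the constant c_75 - c_25, so the windows
# can only intersect while 2 h is at least that constant.
overlaps = {h: window_overlap(idx_25, idx_75, h) for h in (1.0, 0.5, 0.2)}
```

Once the bandwidth is smaller than half the distance between the two quantile curves, no observation contributes to both windows, which is the mechanism behind the asymptotic independence.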



In particular, the above findings imply that there can be no weak convergence of the normalized
process $\big(\sqrt{nh_n}(\hat b_\tau - b_\tau - T_n(s_0, b_\tau, \tau))\big)_{\tau\in T}$ in a reasonable functional sense since the candidate
'limiting process' has a 'white noise' structure and is not tight. This will present an additional
challenge for the analysis of estimators for binary choice probabilities constructed in the following
section.



3 Estimating conditional probabilities 

Partially due to the lack of complete identification, the coefficients estimated in the preceding
section might be hard to interpret. A more tractable quantity is given by the conditional probability
$p_w := P(Y = 1 | W = w)$. One possible way to estimate this probability would be local averaging.
However, due to the curse of dimensionality, this becomes impractical if the length of $W$ exceeds 2
or 3. An alternative is to assume that the linear model $q_\tau(w) = w^T\beta_\tau$ holds for all $\tau \in T \subset [0,1]$. By
definition of $Y = I\{Y^* > 0\}$, the existence of $\tau_w \in T$ with $w^T\beta_{\tau_w} = 0$ implies that $p_w = 1 - \tau_w$. On
the other hand, the quantile function of $Y^*$ is given by $w^T\beta_\tau$ and thus
$P(Y^* \le 0 | W = w) = P(Y^* \le w^T\beta_{\tau_w} | W = w) = \tau_w$.
By definition of the quantile function and the assumptions on $Y^*$, $\tau \le \tau_w \Leftrightarrow w^T\beta_\tau \le w^T\beta_{\tau_w}$. This
implies the equality $\tau_w = \int_0^1 I\{w^T\beta_\tau \le 0\}d\tau$. In particular, we have for any $(a,b) \subset T$ with
$a < \tau_w < b$

\[ p_w = \int_0^1 I\{w^T\beta_\tau > 0\}d\tau = 1 - b + \int_a^b I\{w^T\beta_\tau > 0\}d\tau = 1 - b + \int_a^b I\{w^T\bar\beta_\tau > 0\}d\tau, \]

where $\bar\beta_\tau := \beta_\tau/|\beta_{\tau,1}|$ denotes the re-scaled coefficient vector.

This suggests to estimate $p_w$ by replacing $\beta_\tau$ in the above representation with the estimator $\hat\beta_\tau$ from
the preceding section after choosing $(a,b)$ in some sensible manner. The fact that $\hat\beta_\tau$ is an estimator
of the re-scaled version $\bar\beta_\tau = \beta_\tau/|\beta_{\tau,1}|$ is not important here since multiplication by a positive number does not
affect the inequality $w^T\beta_\tau > 0$. From here on, define

\[ \hat p_w(a,b) := 1 - b + \int_a^b I\{w^T\hat\beta_\tau > 0\}d\tau. \tag{3.1} \]

This also indicates that in order to estimate $p_w$ we do not need the linear model $q_\tau(w) = w^T\beta_\tau$ to
hold globally and also do not require that $\beta_\tau$ can be estimated for all $\tau \in T$. In fact, the validity
of the linear model $q_\tau(w) = w^T\beta_\tau$ for $\tau$ in a neighbourhood of $\tau_w$ and estimability of $\beta_\tau$ on this
region is sufficient for the asymptotic developments provided below.
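The representation (3.1) can be illustrated with known coefficients in place of estimates. The sketch below assumes a hypothetical model $Y^* = z - 1 + (1 + 0.5x)U$ with $U$ standard logistic and $x \ge 0$, so that the quantile coefficients of $w = (z, 1, x)$ are $(1, Q(\tau) - 1, 0.5\,Q(\tau))$ with $Q$ the logistic quantile function; the model, the function names and the midpoint grid are illustrative assumptions. It evaluates the integral on a grid for two different choices of $(a,b)$.

```python
import numpy as np

def logistic_quantile(tau):
    """Quantile function of the standard logistic distribution."""
    return np.log(tau / (1.0 - tau))

def beta_tau(taus):
    """Assumed coefficients of w = (z, 1, x) for Y* = z - 1 + (1 + 0.5 x) U."""
    q = logistic_quantile(np.asarray(taus, dtype=float))
    return np.column_stack([np.ones_like(q), q - 1.0, 0.5 * q])

def choice_probability(w, a, b, coef=beta_tau, m=20_000):
    """Midpoint-rule version of 1 - b + int_a^b I{w^T beta_tau > 0} d tau."""
    taus = a + (np.arange(m) + 0.5) * (b - a) / m
    positive = coef(taus) @ w > 0
    return 1.0 - b + (b - a) * positive.mean()

w = np.array([0.5, 1.0, 1.0])                 # z = 0.5, intercept, x = 1
p_wide = choice_probability(w, 0.01, 0.99)
p_narrow = choice_probability(w, 0.30, 0.80)
# Direct calculation: P(-0.5 + 1.5 U > 0) = P(U > 1/3) for logistic U.
p_exact = 1.0 / (1.0 + np.exp(1.0 / 3.0))
```

Both interval choices contain $\tau_w$ and therefore lead to the same value up to the grid resolution, in line with the observation that only a neighbourhood of $\tau_w$ matters.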

Remark 3.1 Assume that the estimator $\hat\beta_\tau$ is uniformly consistent and that additionally for any
$\delta > 0$ there exists $\varepsilon_\delta > 0$ such that $\inf_{|u-\tau_w|\ge\delta}|w^T(\bar\beta_u - \bar\beta_{\tau_w})| \ge \varepsilon_\delta$. Then uniform consistency of $\hat\beta_\tau$
will directly yield that, with probability tending to one, $\hat p_w(a,b) = \hat p_w(a',b')$ as long as $a < \tau_w < b$
and $a' < \tau_w < b'$. This suggests that from an asymptotic point of view the choice of $a, b$ in the
estimator $\hat p_w(a,b)$ is not very critical. On the other hand, $\tau_w$ is unknown in practice. Thus choosing
$a, b$ as small and large as the data allow, respectively, seems to be a sensible practical approach.
At the same time, identifiability at infinity is not needed to obtain an estimator of probabilities for
points that are bounded away from the boundary of the covariate space.



Remark 3.2 The definition of $p_w$ is closely connected to the concept of rearrangement [see Hardy et al.
(1988)]. More precisely, recall that the monotone rearrangement $\Psi_g^{-1}$ of a function $g : [0,1] \to \mathbb{R}$ is
defined through

\[ \Psi_g(v) := \int_0^1 I\{g(u) \le v\}du \]

where $\Psi_g^{-1}$ denotes the generalized inverse of the function $v \mapsto \Psi_g(v)$ and $\Psi_g$, the first step of the
rearrangement, is the distribution function of $g$ with respect to Lebesgue measure. Thus we can
interpret the integral $\int_0^1 I\{w^T\beta_\tau \le 0\}d\tau$ in the definition of $\tau_w$ as the distribution function of the
map $\tau \mapsto w^T\beta_\tau$. Previously, a smoothed version of the first step of the rearrangement was used
by Dette and Volgushev (2008) to invert a non-increasing estimator of an increasing function in
the setting of quantile regression. On the other hand, it is not obvious if the function $\tau \mapsto w^T\bar\beta_\tau$
is increasing since $\bar\beta_\tau$ is a re-scaled version of the quantile coefficient $\beta_\tau$. However, as we already
pointed out in Remark 3.1 the function $\tau \mapsto w^T\bar\beta_\tau$ will still have a unique zero. As we shall argue
next, the first step of the rearrangement map can provide a way to estimate this zero point in a
sensible way.
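The two steps of the rearrangement can be sketched on an equispaced grid. The curve `g` below is a hypothetical non-monotone function with a single zero, chosen only for illustration; the grid-based helpers are a simple discretization and not the construction analysed in the paper. The first step evaluates the distribution function of $g$ at zero, the second step returns the sorted function values.

```python
import numpy as np

def distribution_of_function(g_values, v):
    """First step: Lebesgue measure of {u : g(u) <= v}, approximated on a grid."""
    return float(np.mean(g_values <= v))

def monotone_rearrangement(g_values):
    """Second step: generalized inverse of the first step, i.e. sorted values."""
    return np.sort(g_values)

m = 1000
u = (np.arange(m) + 0.5) / m
# Hypothetical curve: not monotone, but with a unique zero close to 0.3094.
g = u - 0.3 + 0.05 * np.sin(40.0 * u)

zero_estimate = distribution_of_function(g, 0.0)
rearranged = monotone_rearrangement(g)
```

Even though `g` is not increasing, the measure of the set where it is non-positive recovers the location of its unique zero, which is the property exploited for $\tau \mapsto w^T\bar\beta_\tau$.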



The properties of the rearrangement viewed as mapping between function spaces were considered
in Dette et al. (2006) for estimating a monotone function and Chernozhukov et al. (2010)
for monotonizing crossing quantile curves. In particular, the last-named authors derived a kind
of compact differentiability of the rearrangement mapping at functions that are not necessarily
increasing. However, those results can not be directly applied here since Dette et al. (2006)
and Dette and Volgushev (2008) applied smoothing while the compact differentiability result of
Chernozhukov et al. (2010) requires a process based functional central limit theorem. Due to the
asymptotic independence of the limiting distributions in Theorem 2.5 such a result is impossible
in our setting. Still, a general analysis of the rearrangement map is possible and will be presented
next. The crucial insight is that the process $\tau \mapsto (nh_n)^{1/2}(\hat\beta_\tau - \bar\beta_\tau)$ is still sufficiently smooth on domains
of size decreasing at the rate $h_n$ while its convergence to the limit takes place at a faster rate.



We begin by stating a general result that allows one to derive a uniform linearization of the map
$\Psi$ defined above. In situations where a functional central limit result does not hold (this will often
be the case in the situation of estimators built from local windows), this result seems to be of
independent interest. In particular, it can be used to derive a uniform Bahadur representation for
the estimator $\hat p_w$ in the present setting.

Theorem 3.3 Consider a collection of functions $g_q : [0,1] \to \mathbb{R}$ indexed by a general set $Q$ and
assume that for all $q \in Q$ there exists $u_{0,q} \in (0,1)$ with $g_q(u_{0,q}) = 0$. Additionally, assume that each
$g_q$ is continuously differentiable in a neighbourhood $U_\delta(u_{0,q}) \subset (0,1)\ \forall q \in Q$ and that its derivative
is uniformly Hölder continuous of order $\gamma$ with constant $C_H$ both not depending on $q$, and that for
any $\varepsilon > 0$ we have $\inf_q \inf_{|u-u_{0,q}|\ge\varepsilon}|g_q(u) - g_q(u_{0,q})| > 0$ and $\inf_q g_q'(u_{0,q}) > 0$.
Denote by $g_{n,q} : [0,1] \to \mathbb{R}$ a collection of estimators for $g_q$. Assume that

\[ \sup_{q\in Q}\ \sup_{|u-u_{0,q}|\le\varepsilon_n} \big|g_{n,q}(u_{0,q}) - g_{n,q}(u) - (g_q(u_{0,q}) - g_q(u))\big| = O_P(\xi_n(\varepsilon_n)), \tag{3.2} \]

that

\[ \sup_{q}\ \sup_{u\in[0,1]} |g_{n,q}(u) - g_q(u)| = o_P(1) \tag{3.3} \]

and that

\[ \sup_{q}\ \sup_{|u-u_{0,q}|\le\delta} |g_{n,q}(u) - g_q(u)| = O_P(R_n). \tag{3.4} \]

If for all $c_n = o(C_n)$ with some given $C_n \to \infty$ we have $\xi_n(R_nc_n) = o(R_n)$ it follows that for any
collection of points $a_q, b_q \in (0,1)$ with $a_q + \varepsilon < u_{0,q} < b_q - \varepsilon$ with $\varepsilon > 0$ fixed we have with probability
tending to one

\[ \Psi_{g_{n,q}}(0) = a_q + \int_{a_q}^{b_q} I\{g_{n,q}(u) \le 0\}du \quad \forall q \in Q \tag{3.5} \]



and 



sup 

qeQ 



* S9 (0)-* S „„(0)| =0 P (Rl^ + URn+))- (3.6) 
where $f(x+) := \lim_{\varepsilon\downarrow 0} f(x + \varepsilon)$.



We now state the additional assumptions that are needed to derive the limiting distribution of $\hat p_w$.
Assume that for some $\delta > 0$ the conditions of Theorem 2.5 hold on the set $T^\delta := [\tau_L - \delta, \tau_U + \delta]$
with $T := [\tau_L, \tau_U] \subset (\delta, 1 - \delta)$. For this $T$, we will need the following conditions.

(T1) Define the set $\mathcal{W}_{T^\delta} := \{w \in \mathcal{W} \,|\, \exists \tau_w \in T^\delta : w^T\beta_{\tau_w} = 0\}$. Assume that for every $w \in \mathcal{W}_{T^\delta}$ there
exists a unique $\tau_w \in T$ with $w^T\beta_{\tau_w} = 0$. Assume that the function $\tau \mapsto \bar\beta_\tau$ is continuously
differentiable on $T^\delta$, that its derivative, say $\Lambda_\tau$, is uniformly Hölder continuous of order $\gamma > 0$
and that $L_T := \inf_{\tau\in T^\delta}|w^T\Lambda_{\tau_w}| > 0$.

(T2) The function $\tau \mapsto Q(s_0, b_\tau, \tau)$ is Hölder continuous of order $\gamma > 0$ uniformly on $T^\delta$.

(K4) We have $\sup_{|x|\ge c}|K'(x)| \le c^{-1/2-\varepsilon}$ for some $\varepsilon > 0$ and all $c \ge c_0$.

The above conditions ensure that the collection of estimators $g_{n,w}(\tau) := w^T\hat\beta_\tau$ satisfies the conditions
of Theorem 3.3 [see Lemma 4.4 for condition (3.2)]. An application of this result thus directly
yields the following result. 
Theorem 3.4 Assume that for some $\delta > 0$ the conditions of Theorem 2.5 hold on the set $T^\delta :=
[\tau_L - \delta, \tau_U + \delta]$ with $T := [\tau_L, \tau_U] \subset (\delta, 1 - \delta)$ and let conditions (T1), (T2), (K4) hold. Assume that
for each $w \in \mathcal{W}_T$ we have $(a_w, b_w) \subset T^\delta$. Then for any $w \in \mathcal{W}_T := \{w \in \mathcal{W} \,|\, \exists \tau_w \in T : w^T\beta_{\tau_w} = 0\}$

\[ \hat p_w(a_w, b_w) - p_w = -w^T(\hat\beta_{\tau_w} - \bar\beta_{\tau_w})|w^T\Lambda_{\tau_w}|^{-1} + R_n^{(1)}(w) \]



with 



1 



(logn) 1+ ^ (logn) 2 u fc 

+ ; , o,-,/o I I + op(h n ) + Op(K n ) 



where $\kappa_n$ was defined in Theorem 2.5. In particular, the remainder is negligible compared to
$-w^T(\hat\beta_{\tau_w} - \bar\beta_{\tau_w})|w^T\Lambda_{\tau_w}|^{-1}$. Moreover, for any finite collection $w_1,\dots,w_k \in \mathcal{W}_T$ with $w_j^T = (z_j, x_j^T)$
we obtain



nh n p w . - p w . + \w A Tw | x • T n (s Q , b T , t u 



v 



(w T ^r Wi r^jiQr^ (S , b , T Wj ))- l M T „ 



j=l,...,k 
j)i=l,...,fe 



where $T_n, M_\tau$ are as defined in Theorem 2.5.

From the results derived above, we see that the convergence rate of the estimators for binary
choice probabilities corresponds to the rate typically encountered if one-dimensional smoothing is
performed. Compared to the results of Khan (2013) whose rates correspond to $d$-dimensional
smoothing, this can be a very substantial improvement. While our assumptions are of course
more restrictive than those of Khan (2013), the form of allowed heteroskedasticity is somewhat
more general than the simple multiplicative heteroskedasticity or even homoskedasticity assumed
in previous work. While we of course do not suggest to completely replace the methodologies
developed in the literature, we feel that our approach can be considered as a good compromise
between flexibility of the underlying model and convergence rates. It thus provides a valuable
supplement and extension of available procedures.
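To get a feeling for the size of this gap, one can compare rate exponents. The sketch below balances a bias of order $h_n^k$ against a standard deviation of order $(nh_n)^{-1/2}$, which gives the exponent $k/(2k+1)$, and sets it against the classical benchmark $k/(2k+d)$ for $d$-dimensional smoothing with smoothness $k$. Using the latter as a stand-in for the rates in Khan (2013) is an assumption made only for illustration; the function names are hypothetical.

```python
def one_dim_rate_exponent(k):
    """Exponent r in n^{-r} when h^k and (n h)^{-1/2} are balanced, h = n^{-1/(2k+1)}."""
    return k / (2 * k + 1)

def d_dim_rate_exponent(k, d):
    """Classical benchmark exponent for d-dimensional smoothing with smoothness k."""
    return k / (2 * k + d)

def sample_size_for_error(target, exponent):
    """Sample size n at which n^{-exponent} equals the target error (constants ignored)."""
    return target ** (-1.0 / exponent)

k = 2
r_one = one_dim_rate_exponent(k)          # one-dimensional smoothing
r_five = d_dim_rate_exponent(k, d=5)      # five-dimensional smoothing
n_one = sample_size_for_error(0.05, r_one)
n_five = sample_size_for_error(0.05, r_five)
```

Ignoring constants, the sample size needed for a given accuracy grows much faster under the $d$-dimensional exponent, which is the practical content of the comparison.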
4 Proofs



Proof of Lemma 2.3 By Lemma 2.6.15 and Lemma 2.6.18 in van der Vaart and Wellner (1996),
the classes of functions $\{(y,w) \mapsto yI\{w^T\beta > 0\} \mid \beta \in \mathbb{R}^{d+1}\}$ and $\{w \mapsto I\{w^T\beta > 0\} \mid \beta \in \mathbb{R}^{d+1}\}$ are
VC-subgraph classes of functions. Together with Theorem 2.6.7 and Theorem 2.4.3 in the same
reference this implies

$$\sup_{\tau\in[0,1]}\ \sup_{\beta\in\mathbb{R}^{d+1}} |S_\tau(\beta) - \tilde S_{n,\tau}(\beta)| = o(1) \quad \text{a.s.} \tag{4.1}$$



Next, observe that almost surely

$$|\hat S_{n,\tau}(\beta) - \tilde S_{n,\tau}(\beta)| \le \sup_{|v|>c} |\mathcal{K}(v) - I\{v > 0\}| + \Big(1 + \sup_v |\mathcal{K}(v)|\Big)\frac{1}{n}\sum_{i=1}^n I\{|\beta^TW_i| < ch_n\}.$$



Moreover,

$$I\{|\beta^TW_i| < ch_n\} = I\{\beta^TW_i < ch_n\}\cdot I\{\beta^TW_i > -ch_n\},$$

and the classes of functions $\{w \mapsto I\{w^T\beta < c\} \mid \beta \in \mathbb{R}^{d+1}, c \in \mathbb{R}\}$, $\{w \mapsto I\{w^T\beta > c\} \mid \beta \in \mathbb{R}^{d+1}, c \in \mathbb{R}\}$
are VC-subgraph by Lemma 2.6.15 and Lemma 2.6.18(viii) in van der Vaart and Wellner
(1996). In combination with Theorem 2.6.7 and Theorem 2.4.3 from the same reference this implies

$$\sup_{\beta\in\mathbb{R}^{d+1}}\ \sup_{c\in\mathbb{R}} \Big|\frac{1}{n}\sum_{i=1}^n I\{|\beta^TW_i| < ch_n\} - P(|\beta^TW_i| < ch_n)\Big| \to 0 \quad \text{a.s.}$$

Setting $c = c_n = h_n^{-1/2}$ in the bound for $|\hat S_{n,\tau}(\beta) - \tilde S_{n,\tau}(\beta)|$ we see that the first term, which is
independent of $\beta$, converges to zero by assumption (K1). Moreover, by assumption (F1) we have
for $\beta = (1, b^T)^T$

$$\sup_{b\in\mathbb{R}^d} P(|Z_i + b^TX_i| < c_nh_n) = \sup_{b\in\mathbb{R}^d} \int F_{Z|X}(-b^Tx + h_n^{1/2} \mid x) - F_{Z|X}(-b^Tx - h_n^{1/2} \mid x)\,dP_X(x) = o(1)$$

almost surely. A similar result holds for $\beta = (-1, b^T)^T$. Combining all the results so far we thus
see that



$$\sup_{\tau\in T}\ \sup_{b\in\mathbb{R}^d,\, s=\pm 1} \big|\hat S_{n,\tau}((s,b^T)^T) - S_\tau((s,b^T)^T)\big| = o(1) \quad \text{a.s.}$$



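Finally, observe that almost surely

$$\sup_{\tau\in T} \|\hat\beta_{\tau,n} - \beta_\tau\| \le 2d(s_n) = o(1)$$

where $d(s)$ was defined in condition (D1). To see that this is the case, observe that $\hat\beta_{\tau,n}$ maximizes
$\hat S_{n,\tau}(\beta)$, and thus we have a.s. for every $\tau \in T$

$$0 \le \hat S_{n,\tau}(\hat\beta_{\tau,n}) - \hat S_{n,\tau}(\beta_\tau) \le S_\tau(\hat\beta_{\tau,n}) - S_\tau(\beta_\tau) + 2s_n,$$

which implies $\sup_\tau |S_\tau(\hat\beta_{\tau,n}) - S_\tau(\beta_\tau)| \le 2s_n$ since $S_\tau(\beta_\tau) \ge S_\tau(\hat\beta_{\tau,n})$ for all $\tau \in T$. □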
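The bound between the smoothed and the indicator-based objective used in this proof can be checked numerically. The sketch below is only an illustration under stated assumptions: the standard normal distribution function plays the role of the smooth function $\mathcal{K}$ (so that $\sup_v|\mathcal{K}(v)| = 1$ and the tail term equals $\Phi(-c)$), the data are simulated from a hypothetical latent-variable model, and all function names are made up for this example.

```python
import math
import random

def smooth_step(v):
    """Smooth replacement for I{v > 0}: the standard normal CDF (an assumed choice)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def smoothed_objective(data, beta, tau, h):
    """(1/n) * sum of (Y_i - (1 - tau)) * smooth_step(W_i'beta / h)."""
    return sum((y - (1.0 - tau)) * smooth_step(dot(w, beta) / h) for y, w in data) / len(data)

def indicator_objective(data, beta, tau):
    """(1/n) * sum of (Y_i - (1 - tau)) * I{W_i'beta > 0}."""
    return sum((y - (1.0 - tau)) * (dot(w, beta) > 0) for y, w in data) / len(data)

def gap_bound(data, beta, h, c):
    """Tail term plus (1 + sup|smooth_step|) times the share of points with |W_i'beta| < c*h."""
    tail = smooth_step(-c)  # sup over |v| > c of |Phi(v) - I{v > 0}| for the normal CDF
    near_zero = sum(abs(dot(w, beta)) < c * h for _, w in data) / len(data)
    return tail + 2.0 * near_zero

random.seed(0)
data = []
for _ in range(500):
    w = (random.gauss(0, 1), random.gauss(0, 1))       # W = (Z, X), simulated
    y_star = w[0] + 0.5 * w[1] + random.gauss(0, 1)    # latent variable (assumed model)
    data.append((1.0 if y_star > 0 else 0.0, w))

beta, tau, h = (1.0, 0.5), 0.5, 0.2
c = h ** -0.5                                          # the choice c_n = h_n^{-1/2} from the proof
gap = abs(smoothed_objective(data, beta, tau, h) - indicator_objective(data, beta, tau))
bound = gap_bound(data, beta, h, c)
print(gap <= bound)
```

The inequality holds for every sample, since each summand differs by at most the tail term when $|W_i^T\beta| \ge ch_n$ and by at most $1 + \sup_v|\mathcal{K}(v)|$ otherwise.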
Proof of Theorem 2.5 Define the quantities

$$S_n(s,b,\tau) := \frac{1}{n}\sum_{i=1}^n (Y_i - (1-\tau))\,\mathcal{K}\Big(\frac{X_i^Tb + sZ_i}{h_n}\Big). \tag{4.2}$$

First, by uniform consistency of $\hat\beta_\tau$ and given the fact that $s_0 \in \{-1,1\}$ we see that with probability
tending to one $\hat s_\tau = s_0$ for all $\tau \in T$. Thus defining

$$\tilde b_\tau := \arg\max_{b\in\mathbb{R}^d} S_n(s_0,b,\tau)$$

we see that $P(\tilde b_\tau = \hat b_\tau\ \forall \tau \in T) \to 1$. Moreover, uniform consistency of $\hat b_\tau$ implies that with
probability tending to one it will satisfy $\sup_{\tau\in T} \|\hat b_\tau - b_\tau\| \le \eta$, and continuous differentiability of
$b \mapsto S_n(s,b,\tau)$ for $b$ with $\|b - b_\tau\| \le \eta$ implies that

$$T_n(s_0,\hat b_\tau,\tau) = 0 \quad \forall \tau \in T.$$

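A Taylor expansion now yields that with probability tending to one

$$T_n(s_0,b_\tau,\tau) + Q_n(s_0,b_\tau^*,\tau)(\hat b_\tau - b_\tau) = 0 \quad \forall \tau \in T$$

where

$$Q_n(s,b,\tau) := \frac{\partial T_n(s,b,\tau)}{\partial b} = \frac{1}{nh_n^2}\sum_{i=1}^n (Y_i - (1-\tau))X_iX_i^T K'\Big(\frac{X_i^Tb + sZ_i}{h_n}\Big) \tag{4.3}$$

and $b_\tau^* = \xi_n(\tau)\hat b_\tau + (1 - \xi_n(\tau))b_\tau$ for some $\xi_n(\tau) \in [0,1]$. Rearranging we obtain

$$Q(s_0,b_\tau,\tau)(\hat b_\tau - b_\tau) = -T_n(s_0,b_\tau,\tau) + \big(Q_n(s_0,b_\tau^*,\tau) - Q(s_0,b_\tau,\tau)\big)(b_\tau - \hat b_\tau). \tag{4.4}$$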



Since $\|\hat b_\tau - b_\tau\|_\infty = o_P(1)$ uniformly over $\tau \in T$ and since the same holds for $T_n(s_0,b_\tau,\tau)$ [see
Lemma 4.2], there exists a $\gamma_n \to 0$ such that

$$\sup_{\tau\in T} \big\|T_n(s_0,b_\tau,\tau) + \big(Q_n(s_0,b_\tau^*,\tau) - Q(s_0,b_\tau,\tau)\big)(\hat b_\tau - b_\tau)\big\| = o_P(\gamma_n).$$

By the conditions on $Q(s_0,b_\tau,\tau)$ this implies $\sup_\tau \|\hat b_\tau - b_\tau\| = o_P(\gamma_n)$. By Lemma 4.1 this in turn
implies

$$\sup_{\tau\in T} \big\|Q_n(s_0,b_\tau^*,\tau) - Q(s_0,b_\tau,\tau)\big\| = O_P\big((nh_n^3)^{-1/2}\log n\big) + O(h_n) + o_P(\gamma_n).$$

Plugging this into (4.4) and repeating this argument [note that every application yields an improvement
of the bound until $\gamma_n \sim \sup_\tau \|T_n(s_0,b_\tau,\tau)\|$] yields the assertion (2.1).
For a proof of assertion (2.2) note that for $\kappa \ne \tau$ and any $\delta > 0$ we have as $n \to \infty$



E 



T n (s , 6 T , T) T T n (s , b K , k) 





1 


< 










1 


< 







A' 



ar b T + s z\ r ,/ar 6 K + sqz 



K. 



K 



kr 



f z \ x (z\x)dzdP x (x). 



< \ [ ( sup \K{a)\ + llAHooJlll^ll < \Pr,i\- x S) 

ntl n J V \a\yShn 1 

x ( sup \K(a)\ + \\KWnIiWv? p K \\ < \(3 K ,i \ ~ X 5}) dP w {w) 
\a\>Sh^ ' 
The assumptions on $K$ now imply that for any $\delta > 0$ we have $\sup_{|a| > \delta h_n^{-1}} |K(a)| = o(h_n)$. Thus it
remains to consider the integral

$$\int \|K\|_\infty^2\, I\{|w^T\beta_\kappa| \le |\beta_{\kappa,1}|^{-1}\delta\}\, I\{|w^T\beta_\tau| \le |\beta_{\tau,1}|^{-1}\delta\}\,dP_W(w).$$

By condition (D6), this integral equals zero if we choose $\delta$ such that $|\beta_{\kappa,1}|^{-1}\delta$ and $|\beta_{\tau,1}|^{-1}\delta$ are
below the $\varepsilon$ specified in that condition. Thus $E[T_n(s_0,b_\tau,\tau)^T T_n(s_0,b_\kappa,\kappa)] = o((nh_n)^{-1})$ and the
covariance is negligible compared to the variance, which is of order $(nh_n)^{-1}$. The rest of the proof
follows by standard arguments and is omitted. □
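The relation between the score $T_n$ and its derivative $Q_n$ in (4.3), on which the Taylor expansion in this proof rests, can be sketched numerically. The code below is a minimal illustration for scalar $X_i$, assuming a standard normal density as the kernel $K$ and a simulated data set; the kernel choice, the data-generating model and the function names are assumptions of this example, not part of the paper.

```python
import math
import random

def kernel(v):
    """K: standard normal density (an assumed kernel choice)."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def kernel_prime(v):
    """K': derivative of the standard normal density."""
    return -v * kernel(v)

def score(data, s, b, tau, h):
    """T_n(s, b, tau) = (n h)^{-1} sum (Y_i - (1 - tau)) X_i K((X_i b + s Z_i) / h)."""
    n = len(data)
    return sum((y - (1.0 - tau)) * x * kernel((x * b + s * z) / h) for y, z, x in data) / (n * h)

def hessian(data, s, b, tau, h):
    """Q_n(s, b, tau) = (n h^2)^{-1} sum (Y_i - (1 - tau)) X_i^2 K'((X_i b + s Z_i) / h)."""
    n = len(data)
    return sum((y - (1.0 - tau)) * x * x * kernel_prime((x * b + s * z) / h)
               for y, z, x in data) / (n * h * h)

random.seed(1)
data = []
for _ in range(400):
    z, x = random.gauss(0, 1), random.gauss(0, 1)
    y_star = z + 0.5 * x + random.gauss(0, 1)          # latent variable (assumed model)
    data.append((1.0 if y_star > 0 else 0.0, z, x))

s0, b, tau, h, step = 1.0, 0.5, 0.5, 0.3, 1e-5
# Central finite difference of T_n in b, to be compared with Q_n.
finite_diff = (score(data, s0, b + step, tau, h) - score(data, s0, b - step, tau, h)) / (2.0 * step)
q_n = hessian(data, s0, b, tau, h)
print(abs(finite_diff - q_n) < 1e-5)
```

The extra factor $h_n^{-1}$ in $Q_n$ relative to $T_n$ comes from differentiating the kernel argument, which is the source of the $(nh_n^3)^{-1/2}$ rate appearing in the proof.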



Lemma 4.1 Under assumptions (K1), (K2), (B), (D2), (D3), (D5) we have [$a_n$ was defined in
assumption (K2)]

$$\sup_{\tau\in T}\ \sup_{\|b-b_\tau\|\le\eta} \big\|E[Q_n(s_0,b,\tau)] - Q(s_0,b,\tau)\big\|_\infty = O(h_n + a_n), \tag{4.5}$$

$$\sup_{\tau\in T}\ \sup_{\|b-b_\tau\|\le\eta} \big\|Q_n(s_0,b,\tau) - E[Q_n(s_0,b,\tau)]\big\|_\infty = O_P\big((nh_n^3)^{-1/2}\log n\big), \tag{4.6}$$

where $Q_n(s,b,\tau)$ was defined in (4.3). Moreover, we have for any $\alpha_n \to 0$

$$\sup_{\tau\in T}\ \sup_{\|b-b_\tau\|\le\alpha_n} \big\|Q(s_0,b_\tau,\tau) - Q(s_0,b,\tau)\big\| \le C\alpha_n \tag{4.7}$$

for $\alpha_n$ small enough and some universal constant $C$.

Proof We begin by considering assertion (4.5). Observe that

$$E[Q_n(s_0,b,\tau)] = \frac{1}{h_n^2}\int\!\!\int (\tau - F_{Y^*|X,Z}(0|x,z))\, f_{Z|X}(z|x)\, xx^T K'\Big(\frac{x^Tb + s_0z}{h_n}\Big)\,dz\,dP_X(x)$$

$$= \frac{1}{h_n}\int\!\!\int \big(\tau - F_{Y^*|X,Z}(0|x,s_0(vh_n - x^Tb))\big)\, f_{Z|X}(s_0(vh_n - x^Tb)|x)\, xx^T K'(v)\,dv\,dP_X(x).$$

The assertion now follows from a Taylor expansion, the assumptions on $K$, and standard arguments
similar to those given in Horowitz (2009). For a proof of (4.6) note that for $i,j = 1,\dots,d$ we have

$$\sup_{\tau\in T}\ \sup_{\|b-b_\tau\|\le\eta} \big|\big(Q_n(s_0,b,\tau) - E[Q_n(s_0,b,\tau)]\big)_{i,j}\big| \le h_n^{-2}n^{-1/2} \sup_{f\in\mathcal{F}_{n,i,j}} |G_n(f)|$$

where $\mathcal{F}_{n,i,j}$ denotes the n-dependent class of functions

$$\mathcal{F}_{n,i,j} := \Big\{f_{n,b,\tau}(x,y,z) = K'\Big(\frac{x^T(b_\tau + b) + s_0z}{h_n}\Big)(y - (1-\tau))\,x_ix_j \ \Big|\ b \in \mathbb{R}^d,\ \|b\| \le \eta,\ \tau \in T\Big\}$$

and

$$G_n(f) := n^{-1/2}\sum_{i=1}^n \big(f(X_i,Y_i,Z_i) - E[f(X_i,Y_i,Z_i)]\big).$$
Now uniform Hölder continuity of $K'$, uniform Hölder continuity of $\tau \mapsto b_\tau$, and uniform boundedness
of $x$ imply that for every sufficiently small $\delta > 0$ we have for all $|\tau - \tau'| \le \delta$, $\|b - b'\| \le \delta$ and
some $\gamma > 0$

C 

\\fn,b,r( x ,y,z) ~ fn,b',r'(x, ^)[|oo < T-(|t - r'| 7 + \\b - 6'|| 7 ) 

with $C$ denoting some constant independent of $n, \tau, \tau', b, b'$. This shows that for sufficiently small
$\varepsilon$ the $\|\cdot\|_\infty$-bracketing number [see van der Vaart and Wellner (1996), Chapter 2] of the class $\mathcal{F}_{n,i,j}$
is bounded by

A/j ](e,J^ j , || • |U) < C^h-^+^e-^ 1 ^. 
Next, observe that for any $\tau \in T$, $\|b\| \le \eta$

$$E[f_{n,b,\tau}^2(X,Y,Z)] \le c\int\!\!\int \Big|K'\Big(\frac{x^T(b_\tau + b) + s_0z}{h_n}\Big)\Big|^2\,dz\,dP_X(x) = c\int\!\!\int |K'(zh_n^{-1})|^2\,dz\,dP_X(x) = h_nc\int |K'(z)|^2\,dz.$$

Combining this with Lemma A.1 yields

$$\sup_{f\in\mathcal{F}_{n,i,j}} |G_n(f)| = O_P(h_n^{1/2}\log n),$$

and thus the proof of (4.6) is complete. Finally, assertion (4.7) follows by the smoothness properties
of $F_{Y^*|X,Z}$ and $f_{Z|X}$. Thus the proof is complete. □

Lemma 4.2 Under assumptions (K1)-(K3), (B), (D2), (D4), (D5) we have 

E[f n (s , b T ,r)] - T n (s , K, t) = o(h k n ), 

00 

f n (s ,b T ,T) -E[f n (s ,b T ,r)] =0 P {(nh n )- l ' 2 \ogn). 



sup 



sup 



(4.8) 
(4.9) 



Proof The proof of (4.9) follows by arguments very similar to those used to establish (4.6) and is
therefore omitted. For the proof of (4.8), note that

$$E[T_n(s_0,b_\tau,\tau)] = \frac{1}{h_n}\int\!\!\int x\,(\tau - F_{Y^*|X,Z}(0|x,z))\,K\Big(\frac{s_0z + x^Tb_\tau}{h_n}\Big) f_{Z|X}(z|x)\,dz\,dP_X(x)$$

$$= \int\!\!\int_{|vh_n|>\eta} x\,\big(\tau - F_{Y^*|X,Z}(0|x,s_0(vh_n - x^Tb_\tau))\big)\, f_{Z|X}(s_0(vh_n - x^Tb_\tau)|x)\,K(v)\,dv\,dP_X(x)$$

$$\quad + \int\!\!\int_{|vh_n|\le\eta} x\,\big(\tau - F_{Y^*|X,Z}(0|x,s_0(vh_n - x^Tb_\tau))\big)\, f_{Z|X}(s_0(vh_n - x^Tb_\tau)|x)\,K(v)\,dv\,dP_X(x).$$

The order of the first integral is $o(h_n^k)$ by the assumptions on $K$. The assertion now follows by a
Taylor expansion of the function

$$u \mapsto \big(\tau - F_{Y^*|X,Z}(0|x,s_0(u - x^Tb_\tau))\big)\, f_{Z|X}(s_0(u - x^Tb_\tau)|x),$$

which holds for $|u| \le \eta$, the assumptions on $K$ and standard arguments. □

Proof of Theorem 3.3 The statement (3.5) is a direct consequence of the condition on the
collection of functions $g_q$ and the uniform consistency in (3.3).

The main technical ingredient for the remaining proof are the bounds provided in Lemma 4.3.
Consider an arbitrary sequence $c_n \to \infty$ with $\xi_n(R_nc_n) = o(R_n)$ and set $r_n := R_nc_n$. Then, with
probability tending to one we have for all $q \in Q$

**(0) " *s„n(0) = / H9 q (u) < 0} - I{g n , q (u) < 0}du 

and for all p > there exists no G N such that for all n > no we have sup g \g n ,q{uo,q)\ + 
2(gn(^)+c , Hr-n — ) < r ^ w ^]j probability at least 1 — p [this follows from ]. Applying the bound 
in Lemma 14.31 we thus see that for each q £ Q, as soon as r n < 5 and n large enough we have with 
probability at least 1 — p 



sup 

q 



/ < 0} - I{g n , g (u) < 0}du 

J Un.a—r-n. 



9 rr„^/nl rr„ /..A^nU. Sn,g(«0, 9 ) 



Si(«0,5)l 

2en(r„)+4C7 // rA" 



'"0,9— r n \i/q\ u '^,q 
„l+7 

" . j . l I f . , ! I ? ■ .' / / 

< sup 



q l^(«0,?)l 

Since p was arbitrarily small, the claim of the theorem follows with $r_n$ instead of $R_n$. Since $c_n$ can
converge to infinity arbitrarily slowly, the claim also holds with $R_n$ instead of $r_n$. This completes
the proof. □

Lemma 4.3 Consider functions $g, h : [0,1] \to \mathbb{R}$ and assume that for some $u_0 \in (0,1)$ we
have $g(u_0) = 0$. Additionally, assume that $g$ is continuously differentiable in a neighbourhood
$U_\delta(u_0) \subset (0,1)$ and that its derivative is uniformly Hölder continuous of order $\gamma$ with constant
$C_H$. Define $\xi(\varepsilon) := \sup_{|u-u_0|\le\varepsilon} |h(u_0) - h(u) - (g(u_0) - g(u))|$. Then for any $\varepsilon < \delta$ such that

$$|h(u_0)| + \frac{2(\xi(\varepsilon) + C_H\varepsilon^{1+\gamma})}{|g'(u_0)|} < \varepsilon$$

we have for $[u_0 - \varepsilon, u_0 + \varepsilon] \subset [a,b]$

$$\Big|\int_a^b I\{g(u) < 0\}\,du - \int_a^b I\{h(u) < 0\}\,du - \frac{h(u_0)}{|g'(u_0)|}\Big| \le \frac{2\xi(\varepsilon) + 4C_H\varepsilon^{1+\gamma}}{|g'(u_0)|}.$$
Proof. Rewrite

$$I\{h(u) < 0\} = I\{h(u_0) - (g(u_0) - g(u)) - (h(u_0) - h(u) - (g(u_0) - g(u))) < 0\}$$

and observe that by the properties of $g$ we have

$$\sup_{|u-u_0|\le\varepsilon} |g(u) - g(u_0) - g'(u_0)(u - u_0)| \le C_H\varepsilon^{1+\gamma}.$$

Thus the indicators $I\{h(u_0) + g'(u_0)(u - u_0) < 0, |u - u_0| \le \varepsilon\}$ and $I\{h(u_0) - (g(u_0) - g(u)) -
(h(u_0) - h(u) - (g(u_0) - g(u))) < 0, |u - u_0| \le \varepsilon\}$ can only take different values on an interval with
length at most $2(\xi(\varepsilon) + C_H\varepsilon^{1+\gamma})/|g'(u_0)|$. Thus we see that

$$\Big|\int_{u_0-\varepsilon}^{u_0+\varepsilon} I\{h(u) < 0\}\,du - \int_{u_0-\varepsilon}^{u_0+\varepsilon} I\{h(u_0) + g'(u_0)(u - u_0) < 0\}\,du\Big| \le \frac{2(\xi(\varepsilon) + C_H\varepsilon^{1+\gamma})}{|g'(u_0)|}.$$
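Recalling that $g(u_0) = 0$, similar arguments yield the bound

$$\Big|\int_{u_0-\varepsilon}^{u_0+\varepsilon} I\{g(u) < 0\}\,du - \int_{u_0-\varepsilon}^{u_0+\varepsilon} I\{g'(u_0)(u - u_0) < 0\}\,du\Big| \le \frac{2C_H\varepsilon^{1+\gamma}}{|g'(u_0)|}.$$

Finally, a simple computation shows that under the assumption $|h(u_0)| + \frac{2(\xi(\varepsilon) + C_H\varepsilon^{1+\gamma})}{|g'(u_0)|} < \varepsilon$ we
have

$$\int_{u_0-\varepsilon}^{u_0+\varepsilon} I\{g'(u_0)(u - u_0) < 0\}\,du - \int_{u_0-\varepsilon}^{u_0+\varepsilon} I\{h(u_0) + g'(u_0)(u - u_0) < 0\}\,du = \frac{h(u_0)}{|g'(u_0)|}.$$

Thus the proof is complete. □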
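The linearization in Lemma 4.3 can be checked on a toy example. The functions $g$ and $h$ below, the point $u_0 = 0.4$ and the value $\varepsilon = 0.02$ are hypothetical choices made for this sketch only; for this $g$ the derivative is Lipschitz with constant 2, so we take $\gamma = 1$ and $C_H = 2$.

```python
def g(u):
    t = u - 0.4
    return t + t * t                  # g(u0) = 0 and g'(u0) = 1 at u0 = 0.4

def h(u):
    t = u - 0.4
    return g(u) + 0.01 + 0.02 * t     # small perturbation of g

def measure_negative(f, a=0.0, b=1.0, grid_size=200000):
    """Midpoint-rule approximation of the integral of I{f(u) < 0} over [a, b]."""
    step = (b - a) / grid_size
    return step * sum(f(a + (k + 0.5) * step) < 0 for k in range(grid_size))

u0, g_prime, eps = 0.4, 1.0, 0.02
c_h, gamma = 2.0, 1.0                 # Hoelder constant and order of g' (see lead-in)

# xi(eps): here h(u0) - h(u) - (g(u0) - g(u)) = -0.02 (u - u0) is linear in u,
# so the supremum over |u - u0| <= eps is attained at the endpoints.
xi = max(abs(h(u0) - h(u0 + d) - (g(u0) - g(u0 + d))) for d in (-eps, eps))

condition = abs(h(u0)) + 2.0 * (xi + c_h * eps ** (1.0 + gamma)) / abs(g_prime)
difference = measure_negative(g) - measure_negative(h)
linear_term = h(u0) / abs(g_prime)
bound = (2.0 * xi + 4.0 * c_h * eps ** (1.0 + gamma)) / abs(g_prime)
print(condition < eps, abs(difference - linear_term) <= bound)
```

In this example $g$ and $h$ have the same sign outside $[u_0 - \varepsilon, u_0 + \varepsilon]$, so the integrals over $[0,1]$ differ only through the window around $u_0$, as in the proof.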

Lemma 4.4 Under the assumptions of Theorem 3.4 we have for any $r_n = o(h_n)$

$$\sup_{\tau_w\in T_\delta}\ \sup_{|\tau-\tau_w|\le r_n} \big\|\hat\beta_{\tau_w} - \hat\beta_\tau - (\beta_{\tau_w} - \beta_\tau)\big\| = O_P\big(n^{-1/2}h_n^{-3/2}r_n\log n\big) + o(h_n^k) + O_P(\kappa_n)$$

where $\kappa_n$ was defined in Theorem 2.5.

Proof From Theorem 2.5 we know that $\hat s_\tau = s_0$ for all $\tau \in T_\delta$ with probability tending to one,
and thus it suffices to find a bound for $\|\hat b_\tau - b_\tau - (\hat b_{\tau_w} - b_{\tau_w})\|_\infty$. Here we have for any $\tau, \tau_w \in T_\delta$

$$\hat b_\tau - b_\tau - (\hat b_{\tau_w} - b_{\tau_w})$$
$$= -Q(s_0,b_\tau,\tau)^{-1}T_n(s_0,b_\tau,\tau) + Q(s_0,b_{\tau_w},\tau_w)^{-1}T_n(s_0,b_{\tau_w},\tau_w) + R_n(\tau) + R_n(\tau_w)$$
$$= Q(s_0,b_{\tau_w},\tau_w)^{-1}\big(T_n(s_0,b_{\tau_w},\tau_w) - T_n(s_0,b_\tau,\tau)\big)$$
$$\quad + \big(Q(s_0,b_{\tau_w},\tau_w)^{-1} - Q(s_0,b_\tau,\tau)^{-1}\big)T_n(s_0,b_\tau,\tau) + R_n(\tau) + R_n(\tau_w).$$

Combining condition (T2) with the results from Theorem 2.5 and Lemma 4.2 we see that the term
in the last line is of order $O_P\big(((nh_n)^{-1/2}\log n + h_n^k)^{1+\gamma} + \kappa_n\big)$. For the first term, note that

$$\sup_{\tau_w\in T_\delta}\ \sup_{|\tau-\tau_w|\le r_n} \big\|T_n(s_0,b_{\tau_w},\tau_w) - T_n(s_0,b_\tau,\tau)\big\| \le D_{1,n} + D_{2,n}$$

where

$$D_{1,n} := \sup_{j=1,\dots,d}\ \sup_{\|f-g\|\le h_n^{-1/2}r_n,\ f,g\in\mathcal{F}_{n,j}} h_n^{-1}n^{-1/2}|G_n(f) - G_n(g)|,$$

$$D_{2,n} := \sup_{\tau_w\in T_\delta}\ \sup_{|\tau-\tau_w|\le r_n} \big\|E[T_n(s_0,b_{\tau_w},\tau_w) - T_n(s_0,b_\tau,\tau)]\big\|,$$

$$G_n(f) := n^{-1/2}\sum_{i=1}^n \big(f(X_i,Y_i,Z_i) - E[f(X_i,Y_i,Z_i)]\big),$$

and the classes of functions $\mathcal{F}_{n,j}$ are given by

$$\mathcal{F}_{n,j} := \Big\{(z,x,y) \mapsto K\Big(\frac{x^T(b_\tau + b) + s_0z}{h_n}\Big)x_j(y - (1-\tau)) \ \Big|\ b \in \mathbb{R}^d,\ \|b\| \le \eta,\ \tau \in T\Big\}.$$
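In order to see that the representation for $D_{1,n}$ is true, note that

$$T_n(s_0,b_{\tau_w},\tau_w) - T_n(s_0,b_\tau,\tau) = \frac{1}{nh_n}\sum_{i=1}^n \Big(K\Big(\frac{X_i^Tb_{\tau_w} + s_0Z_i}{h_n}\Big)(Y_i - (1-\tau_w)) - K\Big(\frac{X_i^Tb_\tau + s_0Z_i}{h_n}\Big)(Y_i - (1-\tau))\Big)X_i.$$

In particular we have for n large enough

$$\Big|K\Big(\frac{X_i^Tb_\tau + s_0Z_i}{h_n}\Big)(Y_i - (1-\tau)) - K\Big(\frac{X_i^Tb_{\tau_w} + s_0Z_i}{h_n}\Big)(Y_i - (1-\tau_w))\Big|$$
$$\le \sup_{|v|\le 1}\Big|K'\Big(v + \frac{X_i^Tb_\tau + s_0Z_i}{h_n}\Big)\Big|\,\frac{|X_i^Tb_\tau - X_i^Tb_{\tau_w}|}{h_n} + \sup_v |K(v)|\,|\tau - \tau_w|.$$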

This shows that for sufficiently small $\varepsilon$ the $\|\cdot\|_\infty$-bracketing number [see van der Vaart and Wellner
(1996), Chapter 2] of the class $\mathcal{F}_{n,j}$ is bounded by

M[ ](e,T n}j , || • lU) < Ch-^he-^h. 

Moreover, the above bound implies that for |r — t w \ < r n we have for n large enough 



sup sup 

w \r-r w \<r n 

< ch- l /\ n . 



K 



Xjb T + s Zj 
h n 



(Y t -(l-r))-K 



x Tb Tw + s Zj 
h n 



(Yi - (1 - T W )) 



Applying Lemma A.1 to the classes of functions $\{f - g \mid f,g \in \mathcal{F}_{n,j}, \|f - g\| \le Ch_n^{-1/2}r_n\}$ thus shows
that $D_{1,n} = O_P(n^{-1/2}h_n^{-3/2}r_n\log n)$.

Next, consider $D_{2,n}$. By the results in Lemma 4.2 we have



D 



n.2 



sup 

t 



k 



du k 



(( 



T-F, 



Y*\X,Z\ 



\x, s (-x T b T + u)))f z \x(so{-x T b T + u)\x) 



-(t w - F Y *\x,z(°\ x > s o(-x b Tyj +u)))f zl x(s (-x 1 b Tw +u)\x)) +o{h k n ). 



u=0 



From condition (D4) we see that the left-hand side in the above expression is of order $o(h_n^k)$. Thus
the proof is complete. □



A Technical details

Lemma A.1 Assume that the classes of measurable functions $\mathcal{F}_n$ consist of uniformly bounded
functions (by a constant not depending on n). If additionally

$$N_{[\,]}(\mathcal{F}_n, \varepsilon, L_2(P)) \le Cn^{b}\varepsilon^{-a}$$

for every $\varepsilon \le \delta_n$, some $a > 0$ and constants $C, b$ not depending on n, then we have for any $\delta_n \sim n^{-b}$
with $b < 1/2$

$$\sqrt{n}\sup_{f\in\mathcal{F}_n,\ \|f\|_{P,2}\le\delta_n} \Big(\int f\,dP_n - \int f\,dP\Big) = O_P^*\big(\delta_n(|\log\delta_n| + \log n)\big).$$
Proof. Start by observing that the uniform boundedness of elements of $\mathcal{F}_n$ by $D$ implies that
$F \equiv D$ is a measurable envelope function with $L_2$-norm $D$. Note that for $\eta_n$ sufficiently small

$$a(\eta_n) = \frac{\eta_nD}{\sqrt{1 + \log N_{[\,]}(\eta_nD, \mathcal{F}_n, L_2(P))}} \ge \frac{D\eta_n}{\sqrt{1 + \log C + b\log n - a\log(D\eta_n)}} \ge \frac{D\tilde C\eta_n}{\sqrt{|\log\eta_n| + \log n}}$$

for some finite constant $\tilde C$ depending only on $a, b, C, D$. Thus the bound in Theorem 2.14.2 in
van der Vaart and Wellner (1996) yields for $\delta_n$ sufficiently small

$$E\Big[\sup_{f\in\mathcal{F}_n,\ \|f\|_{P,2}\le\delta_n} \Big|\int f\,d\alpha_n\Big|\Big] \le DJ_{[\,]}(\delta_n, \mathcal{F}_n, L_2(P)) + \sqrt{n}\int F(u)I\{F(u) > \sqrt{n}\,a(\delta_n)\}\,P(du)$$

$$\le DC_1\int_0^{\delta_n} \sqrt{|\log\varepsilon| + \log n}\,d\varepsilon + D\sqrt{n}\,I\{D > \sqrt{n}\,a(\delta_n)\}$$

$$\le DC_2\delta_n(|\log\delta_n| + \log n) + D\sqrt{n}\,I\Big\{1 > \frac{\tilde C\sqrt{n}\,\delta_n}{\sqrt{|\log\delta_n| + \log n}}\Big\},$$

where $\alpha_n := \sqrt{n}(P_n - P)$, $P_n$ denotes the empirical measure, and $C_1, C_2$ are some finite constants.
Here, the second inequality follows by a straightforward calculation and the first inequality is due
to the fact that for $\delta_n$ sufficiently small by definition

$$J_{[\,]}(\delta_n, \mathcal{F}_n, L_2(P)) = \int_0^{\delta_n} \sqrt{1 + \log N_{[\,]}(\varepsilon D, \mathcal{F}_n, L_2(P))}\,d\varepsilon \le C_1\int_0^{\delta_n} \sqrt{|\log\varepsilon| + \log n}\,d\varepsilon.$$

Now under the assumption on $\delta_n$, the indicator in the last line will be zero for n large enough and
thus the proof is complete. □